In DNA, a tandem repeat consists of two or more contiguous copies of a pattern of nucleotides. Tandem repeats of the motif are useful in many applications, such as molecular biology (related to the genetic information of inherited diseases), forensic medicine, DNA fingerprinting and molecular markers for cancer. Various researchers have designed formal models and grammars to identify two contiguous copies of the pattern. A tree-adjoining grammar cannot be designed for the k-copy language with $k \geq 3$. There is a need to design a formal model that works for more than two contiguous copies of the pattern. In this paper, we design deep pushdown automata for k contiguous copies of the pattern for $k \geq 2$. The proposed formal model also identifies tandem repeats without specifying the pattern and its size.

Keywords: Deep pushdown automata; DNA; Formal grammar; K-copy language; Tandem repeats

Copyright © 2018 Institute of Advanced Engineering and Science.
1. INTRODUCTION

Deoxyribonucleic acid (DNA) is a nucleic acid that contains the genetic instructions used in the development and functioning of all living organisms and many viruses. Tandem repeats in DNA consist of two or more contiguous copies of a pattern of nucleotides. Repeating patterns are also known as motifs. A motif can occur in different lengths, and the repetitions can be exact or approximate copies. Repeats of the motif are classified into short tandem repeats, or microsatellites (motifs of length 10 or shorter), and minisatellites (motifs of 10 to 60 nucleotides) [1],[2]. Lalioti et al. [3] and Wren et al. [4] observed that repeats in DNA sequences are associated with neurological disorders. Huang et al. [5], Richard et al. [6] and McMurray [7] reported that tandem repeats play a significant role in the formation of hairpin structures. Motivated by the applications of tandem repeats in DNA sequences in molecular biology, forensic medicine, DNA fingerprinting and molecular markers for cancer [8]-[11], in this paper we design deep pushdown automata for the k-copy language.


Related work: The k-copy language can be described by $L_{k\text{-copy}} = \{ w^k \mid w \in \{0,1\}^* \}$.
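Membership in the k-copy language can be stated directly on strings. The following sketch (the function name `is_k_copy` is ours, not the paper's) checks whether a string is $k$ concatenated copies of one word. It is a plain string check for a fixed $k$, not the automaton model developed later in the paper.

```python
def is_k_copy(x: str, k: int) -> bool:
    """Return True if x == w * k for some word w (hypothetical helper)."""
    if k < 1 or len(x) % k != 0:
        return False
    # The length of x forces the candidate copy w.
    w = x[: len(x) // k]
    return w * k == x


print(is_k_copy("ababab", 3))  # True
print(is_k_copy("ababab", 2))  # False, since "aba" * 2 == "abaaba"
```

Because $|x| = k|w|$, only one candidate $w$ needs to be tested, so the check runs in time linear in $|x|$.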


Various researchers have represented tandem repeats using formal grammars, but the major limitation of their work is that their grammars can recognize only $L_1 = \{ww \mid w \in \{a,b\}^*\}$; they cannot generate languages such as $www$, $wwww$, and so on. The language $L = \{www \mid w \in \{a,b\}^*\}$ cannot be generated by a tree-adjoining grammar [12]. Kalra and Kumar [13], [14] introduced the concepts of fuzzy deep pushdown automata and deterministic deep pushdown automata. Kalra and Kumar [15] designed a state grammar and deep pushdown automata for tandem repeats, inverted repeats and interleaved repeats; that approach works only for a subset of tandem repeat motifs.
Inspired by the various applications of the k-copy language, we design a generalized deep pushdown automaton for tandem repeat motifs.

The paper is organized as follows. Section 2 gives preliminary concepts of DNA, tandem repeats and deep pushdown automata. Section 3 presents deep pushdown automata for the k-copy language and for tandem repeats, followed by conclusions in Section 4.


2. PRELIMINARIES 


Let $\Sigma = \{a, c, g, t\}$ denote the DNA alphabet. The purines are guanine ($g$) and adenine ($a$), whereas the pyrimidines are uracil ($u$), thymine ($t$) and cytosine ($c$). Pairing occurs between pyrimidines and purines. The complement of a symbol $x$ is denoted by $\bar{x}$. In DNA, $\bar{a} = t$, $\bar{t} = a$, $\bar{c} = g$ and $\bar{g} = c$. A deep pushdown automaton is a formal model that represents cross-dependencies in natural and formal languages. It is the automaton counterpart of the state grammar.
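The complement relation above is a fixed symbol-by-symbol map. A minimal sketch (the names `COMPLEMENT` and `complement` are ours) covering the four DNA bases only; uracil, which occurs in RNA, is deliberately left out:

```python
# Watson-Crick complement on the DNA alphabet {a, c, g, t}.
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}


def complement(seq: str) -> str:
    """Return the symbol-by-symbol complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in seq.lower())


print(complement("gatgat"))  # ctacta
```

Applying `complement` twice returns the original (lower-cased) string, since the map is an involution.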


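Def. 1: A deep pushdown automaton [16] is a septuple $M = (Q, \Sigma, \Gamma, s, S, R, F)$, where $Q$ is a finite set of states, $\Sigma$ is an alphabet, $\Gamma$ is a set of stack symbols such that $\Sigma \subseteq \Gamma$, $s \in Q$ is a start state, $S \in \Gamma$ is a starting pushdown symbol, $F \subseteq Q$ is a set of final states, $\# \in \Gamma - \Sigma$ is a special (bottom) symbol, and $R$ is a transition relation defined by

$$R \subseteq \big(\mathbb{N} \times Q \times (\Gamma - (\Sigma \cup \{\#\})) \times Q \times (\Gamma - \{\#\})^{+}\big) \cup \big(\mathbb{N} \times Q \times \{\#\} \times Q \times (\Gamma - \{\#\})^{*}\{\#\}\big),$$

where $\mathbb{N}$ denotes the set of positive integers (the depth). A configuration of the deep pushdown automaton is an element of $Q \times \Sigma^{*} \times (\Gamma - \{\#\})^{*}\{\#\}$.

In this paper, we have made the following changes to the original definition of deep pushdown automata. The transition relation $R$ is defined by

$$R \subseteq \big(\mathbb{N} \times Q \times \Sigma \times (\Gamma - (\Sigma \cup \{\#\})) \times Q \times (\Gamma - \{\#\})^{+} \times \{0,1\}\big) \cup \big(\mathbb{N} \times Q \times \Sigma \times \{\#\} \times Q \times (\Gamma - \{\#\})^{*}\{\#\} \times \{0,1\}\big) \cup \big(\{0\} \times Q \times \Sigma \times \Sigma \times Q \times \{\lambda\} \times \{0,1\}\big).$$

We have explicitly represented the symbol read by the R/W head. The sub-relation $\{0\} \times Q \times \Sigma \times \Sigma \times Q \times \{\lambda\} \times \{0,1\}$ explicitly represents the pop of a terminal symbol from the top of the stack. The last component, taken from $\{0,1\}$, records whether the R/W head remains stationary (0) or moves to the next symbol on the input tape (1). Terminal symbols on top of the stack are considered to be at depth 0, whereas non-terminals on the stack are considered to be at depth 1, 2, 3 and so on.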
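The depth convention is what separates this model from an ordinary pushdown automaton. The sketch below is our own illustration: the function name `expand` and the list-based stack with the top at index 0 are assumptions, not notation from the paper. At depth 0 it rewrites the top symbol; at depth $m \geq 1$ it rewrites the $m$-th topmost non-terminal, skipping any terminals above it.

```python
from typing import List, Optional, Set


def expand(stack: List[str], depth: int, target: str, replacement: str,
           nonterminals: Set[str]) -> Optional[List[str]]:
    """Rewrite the symbol at `depth`; return None if it is not `target`.

    stack[0] is the top of the pushdown. Depth 0 addresses the top symbol;
    depth m >= 1 addresses the m-th topmost non-terminal.
    """
    if depth == 0:
        if not stack:
            return None
        index = 0
    else:
        positions = [i for i, sym in enumerate(stack) if sym in nonterminals]
        if len(positions) < depth:
            return None
        index = positions[depth - 1]
    if stack[index] != target:
        return None
    # An empty replacement plays the role of λ: the symbol is simply removed.
    return stack[:index] + list(replacement) + stack[index + 1:]


# The rule (2, q1, b, A) -> (q2, Ad, 1) of Example 1 applied to AccA#:
print(expand(list("AccA#"), 2, "A", "Ad", {"S", "A"}))
# ['A', 'c', 'c', 'A', 'd', '#']
```

Only the stack rewrite is modelled here; matching the input symbol and moving the R/W head are left to a full simulator.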


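Example 1: Deep pushdown automaton for the language $L = \{a^n b^m c^n d^m \mid n, m \geq 1\}$:
$M = (\{q_0, q_1, q_2, q_3, q_4, q_f\}, \{a, b, c, d\}, \{A, S, a, b, c, d, \#\}, q_0, S, \{q_f\}, R)$, where $R$ is defined by
$(1, q_0, a, S) \rightarrow (q_1, AA, 0)$
$(1, q_1, a, A) \rightarrow (q_1, Ac, 1)$
$(2, q_1, b, A) \rightarrow (q_2, Ad, 1)$
$(2, q_2, b, A) \rightarrow (q_2, Ad, 1)$
$(1, q_2, c, A) \rightarrow (q_3, \lambda, 0)$
$(0, q_3, c, c) \rightarrow (q_3, \lambda, 1)$
$(1, q_3, d, A) \rightarrow (q_4, \lambda, 0)$
$(0, q_4, d, d) \rightarrow (q_4, \lambda, 1)$
$(0, q_4, \lambda, \#) \rightarrow (q_f, \#, 1)$

For the input string $w = aabbbccddd$:
$(q_0, aabbbccddd, S\#) \vdash (q_1, aabbbccddd, AA\#) \vdash (q_1, abbbccddd, AcA\#) \vdash (q_1, bbbccddd, AccA\#) \vdash (q_2, bbccddd, AccAd\#) \vdash (q_2, bccddd, AccAdd\#) \vdash (q_2, ccddd, AccAddd\#) \vdash (q_3, ccddd, ccAddd\#) \vdash (q_3, cddd, cAddd\#) \vdash (q_3, ddd, Addd\#) \vdash (q_4, ddd, ddd\#) \vdash (q_4, dd, dd\#) \vdash (q_4, d, d\#) \vdash (q_4, \lambda, \#) \vdash (q_f, \lambda, \#)$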
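To make the modified semantics concrete, the following breadth-first search sketch runs Example 1. The encoding is our assumption, not the paper's: a rule is a 7-tuple (depth, state, input symbol, stack symbol, new state, replacement, move); the empty string stands for $\lambda$, and a rule whose input symbol is $\lambda$ fires only once the input is exhausted; a word is accepted when a final state is reached with empty input and stack $\#$.

```python
from collections import deque
from typing import NamedTuple, Optional, Set, Tuple

Config = Tuple[str, int, Tuple[str, ...]]  # (state, input position, stack; top first)


class Rule(NamedTuple):
    depth: int      # 0 = top of stack, m >= 1 = m-th topmost non-terminal
    state: str
    symbol: str     # input symbol under the R/W head; "" stands for λ
    target: str     # stack symbol that is rewritten
    new_state: str
    push: str       # replacement string; "" stands for λ
    move: int       # 1 = advance the R/W head, 0 = keep it in place


def successor(rule: Rule, config: Config, word: str,
              nonterminals: Set[str]) -> Optional[Config]:
    state, pos, stack = config
    current = word[pos] if pos < len(word) else ""  # "" once the input is exhausted
    if rule.state != state or rule.symbol != current or not stack:
        return None
    if rule.depth == 0:
        index = 0
    else:
        positions = [i for i, s in enumerate(stack) if s in nonterminals]
        if len(positions) < rule.depth:
            return None
        index = positions[rule.depth - 1]
    if stack[index] != rule.target:
        return None
    new_stack = stack[:index] + tuple(rule.push) + stack[index + 1:]
    return rule.new_state, min(pos + rule.move, len(word)), new_stack


def accepts(rules, start: str, start_symbol: str, finals: Set[str],
            nonterminals: Set[str], word: str) -> bool:
    """Breadth-first search; accept in a final state with empty input and stack #."""
    first: Config = (start, 0, (start_symbol, "#"))
    queue, seen = deque([first]), {first}
    while queue:
        config = queue.popleft()
        state, pos, stack = config
        if state in finals and pos == len(word) and stack == ("#",):
            return True
        for rule in rules:
            nxt = successor(rule, config, word, nonterminals)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


EXAMPLE_1 = [
    Rule(1, "q0", "a", "S", "q1", "AA", 0),
    Rule(1, "q1", "a", "A", "q1", "Ac", 1),
    Rule(2, "q1", "b", "A", "q2", "Ad", 1),
    Rule(2, "q2", "b", "A", "q2", "Ad", 1),
    Rule(1, "q2", "c", "A", "q3", "", 0),
    Rule(0, "q3", "c", "c", "q3", "", 1),
    Rule(1, "q3", "d", "A", "q4", "", 0),
    Rule(0, "q4", "d", "d", "q4", "", 1),
    Rule(0, "q4", "", "#", "qf", "#", 1),
]


def run(word: str) -> bool:
    return accepts(EXAMPLE_1, "q0", "S", {"qf"}, {"S", "A"}, word)


print(run("aabbbccddd"))  # True
print(run("aabbccddd"))   # False: two b's but three d's
```

In this rule set the stack grows without consuming input only once (when $S$ is expanded), so the set of reachable configurations is finite and the search terminates.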


Def. 2: A state grammar is a quintuple $G = (V, Q, \Sigma, S, P)$, where $V$ is a total alphabet, $Q$ is a finite set of states, $\Sigma$ is an alphabet of terminal symbols such that $\Sigma \subset V$, $S \in V - \Sigma$ is the start symbol, and $P$ is a finite relation such that $P \subseteq (Q \times (V - \Sigma)) \times (Q \times V^{+})$.


3. DEEP PUSHDOWN AUTOMATA FOR TANDEM REPEATS 
In this section, we design deep pushdown automata for the k-copy language and for tandem repeats in DNA.


3.1. Deep Pushdown Automata for K-Copy Language 

Figure 1 shows the deep pushdown automaton for the k-copy language. In a deep pushdown automaton, a symbol can also be pushed onto a deeper part of the stack. The deep pushdown automaton in Figure 1 is of depth 2, which means that at any point in time, either of the two topmost non-terminals can be expanded. The deep pushdown automaton for the k-copy language is defined by


$M = (\{p_0, p_1, p_2, p_3, p_4, p_5\}, \{a, b\}, \{S, A, B, C, a, b, \#\}, p_0, S, \{p_5\}, R)$, where the transition relation $R$ is defined by:
$(1, p_0, a, S) \rightarrow (p_0, AB, 0)$
$(1, p_0, b, S) \rightarrow (p_0, AB, 0)$
$(2, p_0, a, B) \rightarrow (p_0, aB, 1)$
$(2, p_0, b, B) \rightarrow (p_0, bB, 1)$
$(1, p_0, a, A) \rightarrow (p_1, \lambda, 0)$
$(1, p_0, b, A) \rightarrow (p_1, \lambda, 0)$
$(0, p_1, a, a) \rightarrow (p_2, \lambda, 1)$
$(0, p_1, b, b) \rightarrow (p_2, \lambda, 1)$
$(1, p_1, b, B) \rightarrow (p_3, BC, 0)$
$(1, p_1, a, B) \rightarrow (p_3, BC, 0)$
$(0, p_3, a, B) \rightarrow (p_2, \lambda, 0)$
$(0, p_3, b, B) \rightarrow (p_2, \lambda, 0)$
$(0, p_3, a, a) \rightarrow (p_4, \lambda, 0)$
$(0, p_3, b, b) \rightarrow (p_4, \lambda, 0)$
$(2, p_4, a, C) \rightarrow (p_3, aC, 1)$
$(2, p_4, b, C) \rightarrow (p_3, bC, 1)$
$(0, p_2, b, b) \rightarrow (p_2, \lambda, 1)$
$(0, p_2, a, a) \rightarrow (p_2, \lambda, 1)$
$(0, p_2, \lambda, B) \rightarrow (p_5, \lambda, 0)$
$(0, p_2, \lambda, C) \rightarrow (p_5, \lambda, 0)$

Figure 1. Deep pushdown automata for k-copy language

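Derivation of the string $w = ababab$:
$(p_0, ababab, S\#) \vdash (p_0, ababab, AB\#) \vdash (p_0, babab, AaB\#) \vdash (p_0, abab, AabB\#) \vdash (p_1, abab, abB\#) \vdash (p_3, abab, abBC\#) \vdash (p_4, abab, bBC\#) \vdash (p_3, bab, bBaC\#) \vdash (p_4, bab, BaC\#) \vdash (p_3, ab, BabC\#) \vdash (p_2, ab, abC\#) \vdash (p_2, b, bC\#) \vdash (p_2, \lambda, C\#) \vdash (p_5, \lambda, \#)$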
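The derivation above can be replayed mechanically. The sketch below re-declares a compact simulator so that it stands alone. Its conventions are our assumptions: transitions are 7-tuples (depth, state, input symbol, stack symbol, new state, replacement, move); `""` stands for $\lambda$ and a $\lambda$-rule fires only at the end of the input; depth 0 rewrites whatever symbol is on top, which is how the listed transitions pop $B$ and $C$; acceptance means state $p_5$, empty input and stack $\#$.

```python
from collections import deque

NONTERMINALS = {"S", "A", "B", "C"}

# Section 3.1 transitions as 7-tuples:
# (depth, state, input symbol, stack symbol, new state, replacement, move);
# "" stands for λ.
RULES = [
    (1, "p0", "a", "S", "p0", "AB", 0), (1, "p0", "b", "S", "p0", "AB", 0),
    (2, "p0", "a", "B", "p0", "aB", 1), (2, "p0", "b", "B", "p0", "bB", 1),
    (1, "p0", "a", "A", "p1", "", 0), (1, "p0", "b", "A", "p1", "", 0),
    (0, "p1", "a", "a", "p2", "", 1), (0, "p1", "b", "b", "p2", "", 1),
    (1, "p1", "b", "B", "p3", "BC", 0), (1, "p1", "a", "B", "p3", "BC", 0),
    (0, "p3", "a", "B", "p2", "", 0), (0, "p3", "b", "B", "p2", "", 0),
    (0, "p3", "a", "a", "p4", "", 0), (0, "p3", "b", "b", "p4", "", 0),
    (2, "p4", "a", "C", "p3", "aC", 1), (2, "p4", "b", "C", "p3", "bC", 1),
    (0, "p2", "b", "b", "p2", "", 1), (0, "p2", "a", "a", "p2", "", 1),
    (0, "p2", "", "B", "p5", "", 0), (0, "p2", "", "C", "p5", "", 0),
]


def successor(rule, state, pos, stack, word):
    """Apply one rule to the configuration (state, pos, stack), or return None."""
    depth, q, sym, target, q_new, push, move = rule
    current = word[pos] if pos < len(word) else ""  # "" = input exhausted
    if q != state or sym != current or not stack:
        return None
    if depth == 0:
        index = 0  # top of the stack
    else:
        nts = [i for i, s in enumerate(stack) if s in NONTERMINALS]
        if len(nts) < depth:
            return None
        index = nts[depth - 1]  # depth-th topmost non-terminal
    if stack[index] != target:
        return None
    new_stack = stack[:index] + tuple(push) + stack[index + 1:]
    return q_new, min(pos + move, len(word)), new_stack


def accepts(word, start="p0", finals=("p5",)):
    """Breadth-first search; accept in a final state with empty input and stack #."""
    first = (start, 0, ("S", "#"))
    queue, seen = deque([first]), {first}
    while queue:
        state, pos, stack = queue.popleft()
        if state in finals and pos == len(word) and stack == ("#",):
            return True
        for rule in RULES:
            nxt = successor(rule, state, pos, stack, word)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


print(accepts("ababab"))  # True
print(accepts("abaab"))   # False
```

Because the search is exhaustive, the nondeterministic guess in state $p_0$ of where the first copy ends is explored automatically.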


3.2. Deep Pushdown Automata for Tandem Repeats in DNA 
The transition diagram of the deep pushdown automaton for tandem repeats is shown in Figure 2. It is defined by $M = (\{p_0, p_1, p_2, p_3, p_4, p_5\}, \{a, c, g, t\}, \{S, A, B, C, a, c, g, t, \#\}, p_0, S, \{p_5\}, R)$, where the transition relation $R$ is defined by:


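Transition from $p_0$ to $p_0$
$(1, p_0, g, S) \rightarrow (p_0, AB, 0)$
$(1, p_0, c, S) \rightarrow (p_0, AB, 0)$
$(1, p_0, a, S) \rightarrow (p_0, AB, 0)$
$(1, p_0, t, S) \rightarrow (p_0, AB, 0)$
$(2, p_0, g, B) \rightarrow (p_0, gB, 1)$
$(2, p_0, c, B) \rightarrow (p_0, cB, 1)$
$(2, p_0, a, B) \rightarrow (p_0, aB, 1)$
$(2, p_0, t, B) \rightarrow (p_0, tB, 1)$
Transition from $p_0$ to $p_1$
$(1, p_0, g, A) \rightarrow (p_1, \lambda, 0)$
$(1, p_0, c, A) \rightarrow (p_1, \lambda, 0)$
$(1, p_0, a, A) \rightarrow (p_1, \lambda, 0)$
$(1, p_0, t, A) \rightarrow (p_1, \lambda, 0)$
Transition from $p_1$ to $p_3$
$(1, p_1, t, B) \rightarrow (p_3, BC, 0)$
$(1, p_1, a, B) \rightarrow (p_3, BC, 0)$
$(1, p_1, c, B) \rightarrow (p_3, BC, 0)$
$(1, p_1, g, B) \rightarrow (p_3, BC, 0)$
Transition from $p_3$ to $p_2$
$(0, p_3, g, B) \rightarrow (p_2, \lambda, 0)$
$(0, p_3, c, B) \rightarrow (p_2, \lambda, 0)$
$(0, p_3, a, B) \rightarrow (p_2, \lambda, 0)$
$(0, p_3, t, B) \rightarrow (p_2, \lambda, 0)$
Transition from $p_3$ to $p_4$
$(0, p_3, g, g) \rightarrow (p_4, \lambda, 0)$
$(0, p_3, c, c) \rightarrow (p_4, \lambda, 0)$
$(0, p_3, a, a) \rightarrow (p_4, \lambda, 0)$
$(0, p_3, t, t) \rightarrow (p_4, \lambda, 0)$
Transition from $p_4$ to $p_3$
$(2, p_4, t, C) \rightarrow (p_3, tC, 1)$
$(2, p_4, a, C) \rightarrow (p_3, aC, 1)$
$(2, p_4, c, C) \rightarrow (p_3, cC, 1)$
$(2, p_4, g, C) \rightarrow (p_3, gC, 1)$
Transition from $p_1$ to $p_2$
$(0, p_1, t, t) \rightarrow (p_2, \lambda, 1)$
$(0, p_1, a, a) \rightarrow (p_2, \lambda, 1)$
$(0, p_1, c, c) \rightarrow (p_2, \lambda, 1)$
$(0, p_1, g, g) \rightarrow (p_2, \lambda, 1)$
Transition from $p_2$ to $p_2$
$(0, p_2, g, g) \rightarrow (p_2, \lambda, 1)$
$(0, p_2, c, c) \rightarrow (p_2, \lambda, 1)$
$(0, p_2, a, a) \rightarrow (p_2, \lambda, 1)$
$(0, p_2, t, t) \rightarrow (p_2, \lambda, 1)$
Transition from $p_2$ to $p_5$
$(0, p_2, \lambda, C) \rightarrow (p_5, \lambda, 0)$
$(0, p_2, \lambda, B) \rightarrow (p_5, \lambda, 0)$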
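Each transition family above is the corresponding Section 3.1 family repeated once per nucleotide. The hypothetical helper `tandem_rules` below (our name, using the same 7-tuple encoding (depth, state, input symbol, stack symbol, new state, replacement, move), with `""` for $\lambda$) builds the table from an alphabet, which makes the count easy to check: nine transitions per symbol plus the two $\lambda$-transitions into $p_5$.

```python
def tandem_rules(alphabet: str):
    """Instantiate the per-symbol transition families of Section 3.2."""
    rules = []
    for x in alphabet:
        rules += [
            (1, "p0", x, "S", "p0", "AB", 0),     # p0 -> p0
            (2, "p0", x, "B", "p0", x + "B", 1),  # p0 -> p0
            (1, "p0", x, "A", "p1", "", 0),       # p0 -> p1
            (1, "p1", x, "B", "p3", "BC", 0),     # p1 -> p3
            (0, "p3", x, "B", "p2", "", 0),       # p3 -> p2
            (0, "p3", x, x, "p4", "", 0),         # p3 -> p4
            (2, "p4", x, "C", "p3", x + "C", 1),  # p4 -> p3
            (0, "p1", x, x, "p2", "", 1),         # p1 -> p2
            (0, "p2", x, x, "p2", "", 1),         # p2 -> p2
        ]
    # The two λ-transitions from p2 to p5 do not depend on the input symbol.
    rules += [(0, "p2", "", "C", "p5", "", 0), (0, "p2", "", "B", "p5", "", 0)]
    return rules


print(len(tandem_rules("gcat")))  # 38
print(len(tandem_rules("ab")))    # 20
```

The 38 transitions for $\{a, c, g, t\}$ and the 20 for $\{a, b\}$ match the number of transitions listed in this section and in Section 3.1.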
Figure 2. Deep pushdown automata for tandem repeats of DNA


Derivation of the input string $w = gatgatgat$:
$(p_0, gatgatgat, S\#) \vdash (p_0, gatgatgat, AB\#) \vdash (p_0, atgatgat, AgB\#) \vdash (p_0, tgatgat, AgaB\#) \vdash (p_0, gatgat, AgatB\#) \vdash (p_1, gatgat, gatB\#) \vdash (p_3, gatgat, gatBC\#) \vdash (p_4, gatgat, atBC\#) \vdash (p_3, atgat, atBgC\#) \vdash (p_4, atgat, tBgC\#) \vdash (p_3, tgat, tBgaC\#) \vdash (p_4, tgat, BgaC\#) \vdash (p_3, gat, BgatC\#) \vdash (p_2, gat, gatC\#) \vdash (p_2, at, atC\#) \vdash (p_2, t, tC\#) \vdash (p_2, \lambda, C\#) \vdash (p_5, \lambda, \#)$
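The automaton recognizes $gatgatgat$ without being told the motif or its length. For comparison only, the following sketch uses a different, swapped-in technique (a plain period search on strings, not the automaton of Figure 2) to recover the shortest motif and the copy count of an exact tandem repeat. The function name `tandem_decomposition` is ours.

```python
def tandem_decomposition(seq: str):
    """Return (motif, k) with seq == motif * k for the shortest motif and k >= 2,
    or None if seq is not an exact tandem repeat."""
    n = len(seq)
    for size in range(1, n // 2 + 1):
        # A motif length must divide n; the first `size` symbols fix the motif.
        if n % size == 0 and seq[:size] * (n // size) == seq:
            return seq[:size], n // size
    return None


print(tandem_decomposition("gatgatgat"))  # ('gat', 3)
print(tandem_decomposition("gatgatga"))   # None
```

Trying motif lengths in increasing order returns the shortest motif, so a sequence such as $aaaa$ is reported as four copies of $a$ rather than two copies of $aa$.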
4. CONCLUSION
In this paper, we have designed deep pushdown automata for tandem repeats. The designed deep pushdown automata work for the k-copy language. The major advantage of the proposed approach over existing approaches is that it works for $w^k$ with $k \geq 2$, not only for $ww$. The following are research directions in which work can be carried out in the near future:

1. Parsing of the k-copy language and tandem repeats.

2. Design of a tool for identifying tandem repeat patterns in DNA sequences.

3. Extension of this model to k-approximate tandem repeats and multiple-length tandem repeats.